Bridging the gap between movement data and connectivity analysis using the Time-Explicit Habitat Selection (TEHS) model

Background
Understanding how to connect habitat remnants to facilitate the movement of species is a critical task in an increasingly fragmented world impacted by human activities. The identification of dispersal routes and corridors through connectivity analysis requires measures of landscape resistance, but there has been no consensus on how to calculate resistance from habitat characteristics, potentially leading to very different connectivity outcomes.

Methods
We propose a new model, called the Time-Explicit Habitat Selection (TEHS) model, that can be directly used for connectivity analysis. The TEHS model decomposes the movement process, in a principled way, into a time component and a selection component, providing complementary information regarding space use by separately assessing the drivers of time to traverse the landscape and the drivers of habitat selection. These models are illustrated using GPS-tracking data from giant anteaters (Myrmecophaga tridactyla) in the Pantanal wetlands of Brazil.

Results
The time model revealed that the fastest movements tended to occur between 8 p.m. and 5 a.m., suggesting a crepuscular/nocturnal behavior. Giant anteaters moved faster over wetlands while moving much slower over forests and savannas, in comparison to grasslands. We also found that wetlands were consistently avoided whereas forests and savannas tended to be selected. Importantly, this model revealed that selection for forests increased with temperature, suggesting that forests may act as important thermal shelters when temperatures are high. Finally, using the spatial absorbing Markov chain framework, we show that the TEHS model results can be used to simulate movement and connectivity within a fragmented landscape, revealing that giant anteaters will often not use the shortest-distance path to the destination patch due to avoidance of certain habitats.

Conclusions
The proposed approach can be used to characterize how landscape features are perceived by individuals through the decomposition of movement patterns into a time and a habitat selection component. Additionally, this framework can help bridge the gap between movement-based models and connectivity analysis, enabling the generation of time-explicit connectivity results.

Supplementary Information
The online version contains supplementary material available at 10.1186/s40462-024-00461-1.


Importing the data
The data come from a giant anteater monitored in the Brazilian Pantanal region. Each row in our data set contains the information for one GPS fix. The columns in this data set are:
• timestamp: day and time of the GPS fix; and
• utm.easting and utm.northing: spatial coordinates of the GPS fix.
We start by importing these data.

Calculating time, distance, and Miss columns and cleaning the data
Then, we calculate step length and time interval between consecutive GPS fixes.
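The calculation of the time, distance, and Miss columns can be sketched in base R as follows. This is a minimal sketch, not the original code: the four GPS fixes are made up, the 20-minute programmed interval is borrowed from the example given later in the text, and the 1.5× tolerance used to flag missed fixes is an assumption.

```r
# Hypothetical GPS fixes; column names follow the data description above
fixes <- data.frame(
  timestamp = as.POSIXct(c("2020-01-01 00:00:00", "2020-01-01 00:20:00",
                           "2020-01-01 01:20:00", "2020-01-01 01:40:00"),
                         tz = "UTC"),
  utm.easting  = c(621171, 621174, 621174, 621214),
  utm.northing = c(7865212, 7865216, 7865316, 7865346)
)
fix_interval <- 20  # assumed programmed interval between fixes (minutes)

n <- nrow(fixes)
steps <- data.frame(
  # time elapsed between consecutive fixes (minutes)
  time1 = as.numeric(difftime(fixes$timestamp[-1], fixes$timestamp[-n],
                              units = "mins")),
  # Euclidean distance between consecutive fixes (meters, UTM coordinates)
  dist1 = sqrt(diff(fixes$utm.easting)^2 + diff(fixes$utm.northing)^2)
)
# Miss = 1 when the interval is clearly longer than the programmed one
# (the 1.5x tolerance is an assumption, meant to absorb small timing jitter)
steps$Miss <- as.integer(steps$time1 > 1.5 * fix_interval)
```

In this toy example the second step spans 60 minutes, so it is flagged as having missed fixes.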

Extracting LULC information
To extract the land-use/land-cover (LULC) information along each step, we first have to define the projection of our data and import the raster file containing the LULC information. At this point, we are ready to begin extracting the LULC information. For each path, we:
a) create a line segment;
b) define its coordinate system;
c) create a 30 m buffer around the line segment;
d) extract information from the raster using our buffer;
e) tabulate the number of pixels of each LULC type and convert these counts to proportions; and
f) store the results.
Notice that, because there are at most 50 LULC categories, we create an empty matrix containing 50 columns and use this matrix to store the results for each path.
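Steps (e) and (f) can be sketched in base R as follows. This sketch assumes that the raster extraction in step (d) has already returned, for each path, a vector of integer LULC codes between 1 and 50. The pixel values below are made up, and the spatial steps (a) to (d) are not shown.

```r
n_cat <- 50  # maximum number of LULC categories (from the text)

# Hypothetical LULC codes of the pixels inside the 30 m buffer of two paths
pixels_by_path <- list(c(3, 3, 3, 12),
                       c(12, 12, 15, 15, 15, 3, 3, 3, 3, 3))

# Empty matrix with one row per path and one column per LULC category
res <- matrix(0, nrow = length(pixels_by_path), ncol = n_cat)
for (i in seq_along(pixels_by_path)) {
  counts <- tabulate(pixels_by_path[[i]], nbins = n_cat)  # pixels per category
  res[i, ] <- counts / sum(counts)                        # counts to proportions
}
```

Each row of res sums to one, and columns for categories that never occur remain equal to zero.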

Creating potential steps for the TEHS model
The TEHS model, on the other hand, requires the creation of additional steps. In our work, we have relied on the creation of 4 potential steps, one in each cardinal direction and all having the same length as the realized step. Here is how we obtain the coordinates of each one of these potential steps.
## 7 621156.0 7865212 621186.0 7865212 621171 7865227 621171 7865197
## 8 621068.8 7865227 621275.2 7865227 621172 7865330 621172 7865124
## 9 621047.3 7865330 621282.7 7865330 621165 7865448 621165 7865212
After the coordinates of the end point of each potential step have been calculated, the corresponding LULC along each potential step is extracted with code similar to that provided above, except that we change the coordinates used to create the line segment.
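The end points of the four potential steps can be computed as in the sketch below. The function name and column names are hypothetical. The west, east, north, south ordering is inferred from the printed coordinates rather than stated in the text, and the start point and step length are illustrative values chosen to be consistent with the first printed row.

```r
# End points of four potential steps, one per cardinal direction, all with the
# same length as the realized step (function and column names are hypothetical)
potential_steps <- function(x0, y0, step_len) {
  data.frame(
    x_west  = x0 - step_len, y_west  = y0,
    x_east  = x0 + step_len, y_east  = y0,
    x_north = x0,            y_north = y0 + step_len,
    x_south = x0,            y_south = y0 - step_len
  )
}

# Illustrative start point and step length (meters, UTM coordinates)
ps <- potential_steps(x0 = 621171, y0 = 7865212, step_len = 15)
```

Because x0, y0, and step_len can be vectors, the same call works for all realized steps at once.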

Goal
The goal of the time model is to understand how different factors influence the time it takes for an animal to traverse the landscape. Some of these factors might be associated with the animal (e.g., sex, size, and age), the landscape (e.g., amount of vegetation or water), or more general environmental variables (e.g., temperature). It is important to note that there might be multiple reasons why an animal spends more time on landscapes with certain characteristics. For example, these characteristics might make it more difficult for the animal to move through the landscape (i.e., resistance). Alternatively, these characteristics might make this patch of landscape ideal foraging, hiding, or resting habitat for the animal, also resulting in slower movements. On the other hand, animals might move faster through places with fewer resources, with short/open vegetation, and/or with higher perceived risk.

Data
The data come from a giant anteater monitored in the Brazilian Pantanal region. Each row in our data set contains the information for a single step taken by this individual:
• time1: the time taken for this step (minutes);
• dist1: the distance traversed (meters) during this step;
• Miss: a binary variable indicating if one or more GPS fixes were missed (=1) or not (=0) for this step. This variable was determined based on the time1 variable. For example, if the tracking device is programmed to obtain GPS fixes every 20 minutes but the time interval between two successful GPS fixes was equal to 60 minutes, this means that two fixes (one at 20 minutes and the other at 40 minutes) were missing;
• forest0: proportion of forest in the area surrounding each step;
• savanna0: proportion of savanna in the area surrounding each step;
• wetland0: proportion of wetland in the area surrounding each step;
• grass0: proportion of grassland in the area surrounding each step; and
• pasture0: proportion of pasture in the area surrounding each step.
It is important to note that the proportion of each land-use/land-cover (LULC) class in the area surrounding each step was calculated by creating a 30-m buffer and determining the proportion of pixels associated with each LULC class. Furthermore, because the parameters will be unidentifiable if all LULC classes are included, we remove grasslands/pastures from this data set and use these as the baseline LULC category.
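The identifiability problem can be illustrated with a small sketch. The proportions below are made up, and only a single grassland column stands in for the grasslands/pastures baseline. Because the proportions in each row sum to one, they reproduce the intercept column exactly, so the full design matrix loses one rank; dropping the baseline class restores full column rank.

```r
# Hypothetical LULC proportions for six steps; each row sums to one
lulc <- cbind(forest0  = c(0.50, 0.10, 0.00, 0.25, 0.70, 0.20),
              savanna0 = c(0.20, 0.30, 0.10, 0.25, 0.10, 0.60),
              wetland0 = c(0.10, 0.20, 0.60, 0.25, 0.10, 0.10),
              grass0   = c(0.20, 0.40, 0.30, 0.25, 0.10, 0.10))

X_full    <- cbind(intercept = 1, lulc)                      # all classes kept
X_reduced <- X_full[, colnames(X_full) != "grass0"]          # baseline dropped

rank_full    <- qr(X_full)$rank     # smaller than the number of columns
rank_reduced <- qr(X_reduced)$rank  # equal to the number of columns
```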

Likelihood and priors
Recall that the likelihood for the time model is given by:

For this example, we assume that the mean of this gamma distribution is given by:

where $D_i$ is the distance traversed by the animal during time step $i$. Notice that we can solve this expression for $a_i$, yielding:

We also allow the precision parameter $b_i$ to depend on $Miss_i$ (a binary variable equal to 1 if one or more GPS fixes were missed in time step $i$) through the function:

The idea here is that missed GPS fixes can potentially increase the variance, given that there is greater uncertainty regarding the true path of the animal. If this is true, then we expect $\gamma_1$ to be negative. If no GPS fixes were missed, then one can just drop the subscript from the precision parameter $b$ and assume that it is a constant parameter.
We finish specifying this model by assigning relatively uninformative priors to $\gamma_0$, $\gamma_1$, and $\beta_0$. On the other hand, we use priors for $\beta_1$, $\beta_2$, and $\beta_3$ that tend to shrink these regression parameters toward zero. As a result, our analysis is more likely to be conservative (i.e., more likely to miss an effect that is present than to detect an effect that is not present).
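One way to make this likelihood concrete is sketched below in base R. The functional forms are assumptions chosen to be consistent with the description rather than the original expressions: the mean time is taken to be $a_i / b_i = D_i \exp(\beta_0 + \beta_1 Forest_i + \beta_2 Savanna_i + \beta_3 Wetland_i)$ and the precision to be $b_i = \exp(\gamma_0 + \gamma_1 Miss_i)$, with $b_i$ treated as the rate of the gamma distribution. The function name and the simulated data are hypothetical.

```r
# Log-likelihood of the time model under the ASSUMED forms described above
time_loglik <- function(beta, gamma, time1, dist1, X, Miss) {
  mu <- dist1 * exp(beta[1] + drop(X %*% beta[-1]))  # assumed mean time per step
  b  <- exp(gamma[1] + gamma[2] * Miss)              # precision (rate), always > 0
  a  <- mu * b                                       # shape, from mean = a / b
  sum(dgamma(time1, shape = a, rate = b, log = TRUE))
}

# Simulated data (hypothetical values) to exercise the function
set.seed(1)
n <- 200
X <- cbind(forest0  = runif(n, 0, 0.4),
           savanna0 = runif(n, 0, 0.3),
           wetland0 = runif(n, 0, 0.3))
dist1 <- runif(n, 10, 300)
Miss  <- rbinom(n, 1, 0.1)
beta_true  <- c(-1.5, 0.5, 0.8, -0.4)
gamma_true <- c(-1, -0.5)
mu    <- dist1 * exp(beta_true[1] + drop(X %*% beta_true[-1]))
b     <- exp(gamma_true[1] + gamma_true[2] * Miss)
time1 <- rgamma(n, shape = mu * b, rate = b)

ll_true <- time_loglik(beta_true, gamma_true, time1, dist1, X, Miss)
ll_null <- time_loglik(c(0, 0, 0, 0), gamma_true, time1, dist1, X, Miss)
```

With data simulated from the assumed model, the log-likelihood at the generating parameters should exceed the one obtained with all regression parameters set to zero.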

Running JAGS
The regression parameters that determine mean time consist of:
• b0: intercept;
• b1: slope associated with proportion of forest;
• b2: slope associated with proportion of savanna; and
• b3: slope associated with proportion of wetland.
The parameters that govern the precision of the gamma distribution consist of:
• g0: intercept for precision; and
• g1: slope associated with missed GPS fixes.
Here we specify the settings for JAGS and finally we fit the model. Note that, to fit this model, the user will need to have installed the software JAGS, available at https://sourceforge.net/projects/mcmc-jags/files/, as this software does not come with R. We rely on the R package "jagsUI" to enable R to communicate with JAGS.
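A possible JAGS specification and fitting call are sketched below. This is not the original model file. The forms of the mean and precision, the normal and double-exponential priors, the MCMC settings, the temporary file name, and the simulated data are all assumptions made for illustration; only the parameter names b0 to b3, g0, and g1 come from the text. The model is fitted only if jagsUI (and therefore JAGS) can be loaded.

```r
# Time model in JAGS syntax; mean/precision forms and all priors are ASSUMPTIONS
model_string <- "
model {
  for (i in 1:n) {
    mu[i] <- dist1[i] * exp(b0 + b1 * forest0[i] + b2 * savanna0[i] + b3 * wetland0[i])
    b[i]  <- exp(g0 + g1 * Miss[i])   # precision depends on missed fixes
    a[i]  <- mu[i] * b[i]             # shape obtained from mean = a / b
    time1[i] ~ dgamma(a[i], b[i])
  }
  b0 ~ dnorm(0, 0.01)                 # vague priors (precision 0.01)
  g0 ~ dnorm(0, 0.01)
  g1 ~ dnorm(0, 0.01)
  b1 ~ ddexp(0, 1)                    # double-exponential priors shrink slopes to zero
  b2 ~ ddexp(0, 1)
  b3 ~ ddexp(0, 1)
}"
model_file <- file.path(tempdir(), "jags_time_sketch.txt")  # hypothetical file name
writeLines(model_string, model_file)

# Simulated data (hypothetical) with the variable names used in the model
set.seed(42)
n <- 100
jags_data <- list(n = n, dist1 = runif(n, 10, 300),
                  forest0 = runif(n, 0, 0.4), savanna0 = runif(n, 0, 0.3),
                  wetland0 = runif(n, 0, 0.3), Miss = rbinom(n, 1, 0.1))
jags_data$time1 <- rgamma(n, shape = 0.2 * jags_data$dist1, rate = 1)

params <- c("b0", "b1", "b2", "b3", "g0", "g1")

# Fit only when jagsUI (and the JAGS software it needs) is available
if (requireNamespace("jagsUI", quietly = TRUE)) {
  mod.results <- jagsUI::jags(data = jags_data, inits = NULL,
                              parameters.to.save = params,
                              model.file = model_file,
                              n.chains = 3, n.iter = 2000, n.burnin = 1000,
                              n.thin = 1, verbose = FALSE)
}
```

The chain lengths used here are deliberately short; a real analysis would likely need longer runs.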

Assessing convergence
We assess MCMC convergence by determining if the Rhat statistic is below 1.1 for all parameters. According to the results below, our algorithm has successfully converged.
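To show what the Rhat statistic measures, the sketch below computes the basic Gelman-Rubin potential scale reduction factor from a matrix of draws with one column per chain. This is an illustrative base R implementation with simulated chains; the value reported by the fitting software may include additional corrections and therefore differ slightly.

```r
# Basic Gelman-Rubin statistic for one parameter
rhat <- function(chains) {
  # chains: matrix of posterior draws, one column per chain
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))   # mean within-chain variance
  B <- n * var(colMeans(chains))     # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

set.seed(1)
mixed   <- matrix(rnorm(3000), ncol = 3)          # three chains, same target
unmixed <- mixed + rep(c(0, 3, 6), each = 1000)   # chains centered at different values

rhat_mixed   <- rhat(mixed)    # close to 1: chains agree
rhat_unmixed <- rhat(unmixed)  # well above 1.1: chains disagree
```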

Interpreting JAGS output
We can plot the posterior distribution of the regression parameters using the code below. This plot shows positive slope coefficients for forest and savanna, indicating that this animal tends to spend more time (i.e., moves more slowly) traversing a landscape with a greater abundance of forests and savannas when compared to grasslands/pastures (i.e., the baseline land cover). On the other hand, the slope was estimated to be negative for wetlands, revealing that this animal tends to move faster in landscapes with wetlands when compared to grasslands/pastures.
#collect posterior samples of the regression parameters
betas=cbind(mod.results$sims.list$b0,mod.results$sims.list$b1,mod.results$sims.list$b2,mod.results$sims.list$b3)
colnames(betas)=paste0('b',0:3)
#boxplot of the slopes (the intercept b0 is dropped), with a reference line at zero
boxplot(betas[,-1],las=2,names=c('forest','savanna','wetland'),ylab='Slope estimates')
abline(h=0,col='grey')
In this figure, we can see that the higher the proportion of savanna around the animal, the slower it will move. In this case, we believe that this slower movement is due to savannas being harder for giant anteaters to move through when compared to grasslands.
We can also look at the parameters that govern the precision of the gamma distribution. We expect that missed GPS fixes can potentially decrease the precision, given that there is greater uncertainty regarding the true path of the animal. If this is true, then we expect g1 (the slope for Miss) to be negative.
Potential steps can be selected in a variety of ways without biasing the results of the model. In this example, we used the actual step length and defined four potential steps by assuming that the animal could have moved in the four cardinal directions (i.e., east, west, north, and south from where the step started). Note, however, that the potential steps need not have the same length as the observed step for our model to work.
We start by importing these data using the code below:

Preparing data for JAGS
We start by storing the LULC covariates for the realized steps in the matrix xmat0, whereas the covariates for the potential steps 1, ..., 4 are stored in the matrices xmat1, ..., xmat4, respectively. Similarly, we store the feasibility of the realized steps in the vector pmov0, whereas for the potential steps 1, ..., 4 this information is stored in the vectors pmov1, ..., pmov4, respectively. We also create a vector y comprised of ones to be able to perform the "ones trick" (described below). Finally, we put all this together in a list called dat1.
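The data preparation just described can be sketched as follows. The covariate values and feasibilities are simulated, the list element nobs is a hypothetical name for the number of realized steps, and the column naming (forest0, ..., wetland4) assumes that the suffix identifies the realized step (0) or the potential step (1 to 4).

```r
set.seed(1)
n    <- 5                                  # number of realized steps
covs <- c("forest", "savanna", "wetland")  # baseline class already removed

# Hypothetical covariates: suffix 0 = realized step, 1..4 = potential steps
steps <- as.data.frame(matrix(
  runif(n * 15), n, 15,
  dimnames = list(NULL, paste0(rep(covs, 5), rep(0:4, each = 3)))))
# Hypothetical feasibilities: column 1 = realized step, columns 2..5 = potential steps
prob <- matrix(runif(n * 5), n, 5)

# One covariate matrix per step type
xmats <- lapply(0:4, function(j) data.matrix(steps[, paste0(covs, j)]))

dat1 <- list(
  xmat0 = xmats[[1]], xmat1 = xmats[[2]], xmat2 = xmats[[3]],
  xmat3 = xmats[[4]], xmat4 = xmats[[5]],
  pmov0 = prob[, 1], pmov1 = prob[, 2], pmov2 = prob[, 3],
  pmov3 = prob[, 4], pmov4 = prob[, 5],
  y = rep(1, n),   # vector of ones for the "ones trick"
  nobs = n         # hypothetical name for the number of realized steps
)
```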

Specifying the likelihood of a conditional logistic model
The likelihood of our model is given by:

where $xmat_{0i}$ is a vector that contains the covariates for the realized step $i$, whereas $xmat_{1i}, \ldots, xmat_{4i}$ contain the covariates for the matched potential steps 1, ..., 4, respectively. Similarly, $pmov_{0i}$ is the feasibility of the realized step $i$, whereas $pmov_{1i}, \ldots, pmov_{4i}$ are the feasibilities for the matched potential steps 1, ..., 4, respectively.
In JAGS, we can specify this non-standard likelihood using the so-called "ones trick". More specifically, let

If we set $y_i$ to 1 for $i = 1, \ldots, n$ and model this variable using a Bernoulli distribution, the likelihood will be given by the desired equation:
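The logic of the "ones trick" can be checked numerically with the sketch below. It assumes a conditional logistic form in which the probability of the realized step is its feasibility-weighted exponential score divided by the sum of the same quantity over the realized and the four potential steps; this form, the function name, and the simulated inputs are assumptions rather than the original expressions.

```r
# Probability of the realized step under the ASSUMED conditional logistic form
step_prob <- function(beta, xmats, pmovs) {
  # xmats: list of covariate matrices (realized step first, then potential steps)
  # pmovs: matrix of feasibilities with one column per element of xmats
  w <- matrix(NA_real_, nrow(pmovs), ncol(pmovs))
  for (j in seq_along(xmats)) {
    w[, j] <- pmovs[, j] * exp(drop(xmats[[j]] %*% beta))
  }
  w[, 1] / rowSums(w)
}

set.seed(1)
n     <- 4
xmats <- lapply(1:5, function(j) matrix(runif(n * 3), n, 3))
pmovs <- matrix(0.3, n, 5)      # equal feasibilities for all steps
y     <- rep(1, n)              # the vector of ones

p <- step_prob(beta = c(0, 0, 0), xmats = xmats, pmovs = pmovs)

# Bernoulli log-likelihood of y = 1 equals the sum of log(p): the "ones trick"
ll_bernoulli <- sum(dbinom(y, size = 1, prob = p, log = TRUE))
ll_direct    <- sum(log(p))
```

With no habitat effect and equal feasibilities, each of the five steps is equally likely, so every element of p equals 1/5.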

JAGS model
We translate the equations provided above into the JAGS model using the code below. This code is stored in a separate file called "jags_tssf.R".
#keep values that really show up
colnames(res)=paste0('lulc',1:ncol(res))
sum1=apply(res,2,sum)
res1=res[,sum1>0,drop=FALSE] #logical indexing also works when no column sums to zero
colnames(res1)=paste0(c('forest','savanna','wetland','grass','pasture'),j)
Recall that we rely on the time model to determine the feasibility of reaching different areas (i.e., to determine which areas are indeed available for the animal). For this reason, we import the parameter estimates from the time model and then calculate the feasibility of each step (i.e., calculate the gamma density associated with the realized and potential steps). It is important to note that, instead of explicitly writing $\beta_0 + \beta_1 \times Forest_i + \beta_2 \times Savanna_i + \beta_3 \times Wetland_i$, we rely on linear algebra to perform this calculation for each realized and potential step. The gamma density evaluated at the realized and potential steps is stored in the matrix prob:
#get betas and gs
setwd('U:\\timemod\\tutorials\\timemod tutorial\\results')
betas=read.csv('betas.csv')
gs=read.csv('gs.csv')
and savannas when compared to grasslands/pastures (the baseline LULC class). On the other hand, the 95% credible intervals for wetlands encompass zero, indicating that there is no discernible difference in preference between wetlands and grasslands/pastures.